Noise or additional information? Leveraging crowdsource annotation item agreement for natural language tasks

Authors

  • Emily Jamison
  • Iryna Gurevych
Abstract

In order to reduce noise in training data, most natural language crowdsourcing annotation tasks gather redundant labels and aggregate them into an integrated label, which is provided to the classifier. However, aggregation discards potentially useful information from linguistically ambiguous instances. For five natural language tasks, we pass item agreement on to the task classifier via soft labeling and low-agreement filtering of the training dataset. We find a statistically significant benefit from low item agreement training filtering in four of our five tasks, and no systematic benefit from soft labeling.
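The two ways of using item agreement named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the agreement measure (fraction of annotators choosing the majority label), the threshold value, and the function names `aggregate` and `filter_low_agreement` are all assumptions made for illustration.

```python
from collections import Counter


def aggregate(labels):
    """Summarize one item's redundant crowd labels.

    Returns (majority_label, agreement, soft_label), where agreement is the
    fraction of annotators who chose the majority label (an assumed measure)
    and soft_label is the empirical label distribution.
    """
    counts = Counter(labels)
    total = len(labels)
    majority, top_count = counts.most_common(1)[0]
    soft_label = {label: n / total for label, n in counts.items()}
    return majority, top_count / total, soft_label


def filter_low_agreement(items, threshold):
    """Keep only training items whose agreement reaches the threshold.

    `items` is a list of (text, labels) pairs; the result pairs each kept
    text with its majority label.
    """
    kept = []
    for text, labels in items:
        majority, agreement, _ = aggregate(labels)
        if agreement >= threshold:
            kept.append((text, majority))
    return kept


if __name__ == "__main__":
    # Hypothetical toy data: five crowd labels per item.
    items = [
        ("item A", ["pos", "pos", "pos", "neg", "pos"]),
        ("item B", ["pos", "neg", "neg", "pos", "neu"]),
    ]
    print(aggregate(items[0][1]))
    print(filter_low_agreement(items, threshold=0.6))
```

In this sketch, soft labeling would hand `soft_label` to a classifier that accepts label distributions or instance weights, while filtering simply drops items such as "item B" before training.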

Similar resources

Needle in a Haystack: Reducing the Costs of Annotating Rare-Class Instances in Imbalanced Datasets

Crowdsourced data annotation is noisier than annotation from trained workers. Previous work has shown that redundant annotations can eliminate the agreement gap between crowdsource workers and trained workers. Redundant annotation is usually nonproblematic because individual crowdsource judgments are inconsequentially cheap in a class-balanced dataset. However, redundant annotation on classimba...

An Event Factuality Annotation Proposal for Basque

Factuality information gives evidence on whether the events in texts have happened. This information can be relevant in natural language processing tasks such as timeline generation as it helps discriminating the events that are relevant to a certain timeline. We analysed some factuality annotation schemes and proposed a new scheme that aims at concise and easy annotation. We worked on a Basque...

Hotspotting - A Probabilistic Graphical Model For Image Object Localization Through Crowdsourcing

Object localization is an image annotation task which consists of finding the location of a target object in an image. It is common to crowdsource annotation tasks and aggregate responses to estimate the true annotation. While for other kinds of annotations consensus is simple and powerful, it cannot be applied to object localization as effectively due to the task’s rich answer space and inhere...

Frame Semantics Annotation Made Easy with DBpedia

Crowdsourcing techniques applied to natural language processing have recently experienced a steady growth and represent a cheap and fast, albeit valid, solution to create benchmarks and training data. Nevertheless, some particularly complex tasks such as semantic role annotation have been rarely conducted in a crowdsourcing environment, due to their intrinsic difficulty. In this paper, we prese...

Annotation Adaptation and Language Adaptation in NLP

Adaptation technologies are always useful in NLP when there is discrepancy between the training scenario and use scenario. They are also effective in alleviating the data scarcity problem. Domain adaptation is the most popular kind of adaptation technologies and is intensively researched. In this talk we will introduce two other kinds of adaptation technologies: annotation adaptation and langua...

Publication date: 2015